On the Consistency of Bayesian Variable Selection for High Dimensional Binary Regression and Classification

Author

  • Wenxin Jiang
Abstract

Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. We use a prior to select a limited number of candidate variables to enter the model, applying a popular method with selection indicators. We show that this approach can induce posterior estimates of the regression functions that consistently estimate the truth, if the true regression model is sparse in the sense that the aggregated size of the regression coefficients is bounded. The estimated regression functions can therefore also produce consistent classifiers that are asymptotically optimal for predicting future binary outputs. These results provide theoretical justification for some recent empirical successes in microarray data analysis.
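
The setting described in the abstract can be written out in symbols. The block below is only a notational sketch: the symbols (y, x, K, n, beta, gamma, psi, C, and the hatted quantities) are assumed here for illustration and are not quoted from the paper, and reading "aggregated size" as the sum of absolute coefficients is an interpretation of the abstract's wording.

```latex
% Illustrative notation only (assumed, not taken from the paper).
% Binary output y in {0,1}; inputs x = (x_1, ..., x_K); sample size n,
% with K possibly much larger than n.

% Regression function with selection indicators gamma_j:
% variable j enters the model only when gamma_j = 1.
\[
  \mu(x) = P(y = 1 \mid x)
         = \psi\Big( \sum_{j=1}^{K} \gamma_j \beta_j x_j \Big),
  \qquad \gamma_j \in \{0, 1\}.
\]

% Link function: logistic regression or probit regression.
\[
  \psi(t) = \frac{e^{t}}{1 + e^{t}} \quad \text{(logistic)}
  \qquad \text{or} \qquad
  \psi(t) = \Phi(t) \quad \text{(probit)}.
\]

% Sparsity of the true model, read here as a bounded aggregated
% coefficient size (an assumed interpretation of the abstract).
\[
  \sum_{j=1}^{K} \lvert \beta_j^{*} \rvert \le C < \infty .
\]

% Plug-in classification rule built from an estimate of mu.
\[
  \hat{C}(x) = \mathbf{1}\{ \hat{\mu}(x) > 1/2 \}.
\]
```

Under this reading, consistency of the estimated regression function means that the estimate of mu approaches the true regression function as n grows, and the thresholded rule then has a misclassification risk that approaches the smallest achievable risk.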

Related resources

A Validation Test Naive Bayesian Classification Algorithm and Probit Regression as Prediction Models for Managerial Overconfidence in Iran's Capital Market

Corporate directors are influenced by overconfidence, one of the personality traits of individuals, and may make irrational decisions that have a significant long-run impact on the company's performance. The purpose of this paper is to validate and compare the Naive Bayesian Classification algorithm and probit regression in the prediction of Management's overconfident at pre...

Bayesian Variable Selection for High Dimensional Generalized Linear Models: Convergence Rates of the Fitted Densities

Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1, ..., xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj's have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by...

Classification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset

Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is becoming a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and improvement of CKD management. In metabolomics studies, not only prediction accuracy is ...

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...

Journal:
  • Neural Computation

Volume 18, Issue 11

Pages -

Publication date: 2006